MARKOVIAN SEQUENTIAL CONTROL PROCESSES:
DENUMERABLE STATE SPACE


Cyrus  Derman 
Columbia  University 


1.  Introduction 


As in [4], [5], [6], we are concerned with a dynamic
system which is observed periodically and classified into
one of a number of possible states. After each observation
one of a possible number of decisions is made. The
decisions determine the chance laws of the system.
Previously, our considerations were confined to finite
state spaces; here, we allow the number of possible states
to be infinite.

Let $I$ denote the state space of the system.
Throughout, we shall assume $I$ to be denumerable, though
with suitable modifications our theorem below remains
valid for a general state space. Whenever the system is
in state $i$ ($i \in I$) there are $K_i$ possible decisions. Denoting
by $\{Y_t\}$, $t = 0, 1, \ldots$, the sequence of states and by
$\{\Delta_t\}$, $t = 0, 1, \ldots$, the sequence of decisions, we assume
that

\[
\text{(M)} \qquad P\{Y_{t+1} = j \mid s_{t-1},\ Y_t = i,\ \Delta_t = k\} = q_{ij}(k)
\]

for $k = 1, \ldots, K_i$; $i, j \in I$; $t = 0, 1, \ldots$, where, for each $t$,
$s_t$ denotes the history of states and decisions (i.e.,
$s_t = \{Y_0 = y_0,\ \Delta_0 = \delta_0,\ \ldots,\ Y_t = y_t,\ \Delta_t = \delta_t\}$) and the
$q_{ij}(k)$'s are non-negative numbers such that

\[
\sum_{j \in I} q_{ij}(k) = 1, \qquad k = 1, \ldots, K_i;\ i \in I.
\]
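As an illustration of assumption (M), the following minimal sketch stores the numbers $q_{ij}(k)$ for a small finite example, checks that each row is a probability distribution, and samples one transition. The two-state system and the names `q`, `check_transition_law`, and `step` are assumptions made for this example only; they do not come from the paper.

```python
import random

# Hypothetical two-state example: q[i][k][j] plays the role of q_ij(k).
# State 0 has K_0 = 2 decisions, state 1 has K_1 = 1 decision.
q = {
    0: {1: {0: 0.9, 1: 0.1}, 2: {0: 0.2, 1: 0.8}},
    1: {1: {0: 0.5, 1: 0.5}},
}

def check_transition_law(q, tol=1e-12):
    """Return True if every row q_i.(k) is non-negative and sums to one."""
    for decisions in q.values():
        for row in decisions.values():
            if any(p < 0 for p in row.values()):
                return False
            if abs(sum(row.values()) - 1.0) > tol:
                return False
    return True

def step(q, i, k, rng):
    """Sample Y_{t+1} given Y_t = i and Delta_t = k.

    Under (M) the earlier history s_{t-1} does not enter the law of Y_{t+1},
    so it is not an argument of this function.
    """
    states = list(q[i][k].keys())
    weights = list(q[i][k].values())
    return rng.choices(states, weights=weights, k=1)[0]

rng = random.Random(0)
print(check_transition_law(q))
print(step(q, 0, 2, rng) in (0, 1))
```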


Roughly speaking, a rule $R$ for sequentially
controlling the process is a well-defined procedure which
specifies the decision to be made at each point in time
as a function of the history of the system. More precisely,
we say $R$ is a set of non-negative functions $\{D_k(s_{t-1}, y_t)\}$,
where for each $t$ ($t = 0, 1, \ldots$) the domains of definition
are the possible values of $s_{t-1}$, $y_t$ and $k$, and such that

\[
\sum_{k} D_k(\cdot, \cdot) = 1;
\]

we define

\[
P\{\Delta_t = k \mid s_{t-1},\ Y_t = y_t\} = D_k(s_{t-1}, y_t)
\]

for all $k = 1, \ldots, K_{y_t}$, $s_{t-1}$, $y_t$, and $t = 0, 1, \ldots$. That is,
we allow decisions to be made by a random mechanism, the
mechanism used to depend on the history of the system.
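A rule in this sense can be pictured as a function that maps a history and the current state to a probability distribution over the available decisions. The sketch below is one possible encoding, assuming a history is a list of (state, decision) pairs; `sample_decision` and `history_dependent_rule` are hypothetical names, and the particular rule is invented purely for illustration.

```python
import random

def sample_decision(rule, history, y, num_decisions, rng):
    """Draw Delta_t from the distribution D_k(s_{t-1}, y_t) supplied by `rule`."""
    probs = rule(history, y, num_decisions)  # dict: decision k -> D_k(history, y)
    if any(p < 0 for p in probs.values()) or abs(sum(probs.values()) - 1.0) > 1e-9:
        raise ValueError("D_k(., .) must be non-negative and sum to one over k")
    ks = sorted(probs)
    return rng.choices(ks, weights=[probs[k] for k in ks], k=1)[0]

def history_dependent_rule(history, y, num_decisions):
    """Hypothetical randomized rule: favour the previous decision when it is
    still available, otherwise randomize uniformly."""
    probs = {k: 0.0 for k in range(1, num_decisions + 1)}
    last = history[-1][1] if history else None
    if last in probs:
        probs[last] += 0.7
        for k in probs:
            probs[k] += 0.3 / num_decisions
    else:
        for k in probs:
            probs[k] = 1.0 / num_decisions
    return probs

rng = random.Random(1)
history = [(0, 2), (1, 1)]  # s_{t-1} as (state, decision) pairs
d = history_dependent_rule(history, y=0, num_decisions=2)
print(d)
print(sample_decision(history_dependent_rule, history, 0, 2, rng) in (1, 2))
```

A stationary non-randomized rule is the special case in which the returned distribution ignores `history` and puts mass one on a single decision.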

We denote by $\mathcal{R}$ the class of all rules $R$. Once
initial probabilities $P\{Y_0 = i\}$, $i \in I$, are given and a
rule $R \in \mathcal{R}$ is specified, the sequences $\{Y_t\}$ and $\{Y_t, \Delta_t\}$,
$t = 0, 1, \ldots$, are stochastic processes. We shall call the
process a Markovian sequential control process.

It is not true that $\{Y_t\}$ or even $\{Y_t, \Delta_t\}$ will always
be Markovian; whether they are or not will depend on the
rule $R$. However, we use the term Markovian because of
assumption (M), which imposes a kind of Markovian structure
on our processes. Such processes are a natural outgrowth
of the dynamic programming point of view and the theory
of Markov chains. They were first discussed by Bellman
(see, e.g., [1] and [2], and also [4], [5], and [6] for
other references). Set

\[
P_t(j, k \mid i, R) = P(Y_t = j,\ \Delta_t = k \mid Y_0 = i,\ R)
\]

and let, for any $\alpha$, $0 < \alpha < 1$, and $i \in I$,

\[
\psi(i, \alpha, R) = \sum_{t=0}^{\infty} \alpha^t \sum_{j,k} P_t(j, k \mid i, R)\, w_{jk},
\]

where $\{w_{jk}\}$ are given numbers. $\psi(i, \alpha, R)$ can be thought
of as the expected discounted cost over an infinite horizon
of operating the system using rule $R$, given that $i$ is the
initial state and $w_{jk}$ denotes the cost incurred whenever
the system is in state $j$ and decision $k$ is made.

A question of concern is whether, for any given
$\alpha$ and $i$, there exists a rule $R_0$ such that

\[
\psi(i, \alpha, R_0) = \min_{R \in \mathcal{R}} \psi(i, \alpha, R).
\]


Conditions will be given which assure existence of such
an optimal rule. It then follows that there is a
non-randomized stationary rule which is optimal over $\mathcal{R}$. By
a stationary rule we mean a rule such that

\[
D_k(s_{t-1},\ Y_t = i) = D_{ik}
\]

for every $t = 0, 1, \ldots$, $k = 1, \ldots, K_i$, and $i \in I$. A non-randomized
rule has its $D_k(\cdot, \cdot)$'s either zero or one. Thus a
non-randomized stationary rule is such that there is one
decision associated with each state and that decision is
made each time the system is observed to be in that state.
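To make these definitions concrete, the following sketch evaluates $\psi(i, \alpha, R)$ for a non-randomized stationary rule on a small finite example by truncating the series that defines $\psi$. The two-state system, the names `q`, `w`, and `discounted_cost`, and the truncation horizon are illustrative assumptions and are not part of the original development.

```python
# Hypothetical example: q[i][k][j] stands for q_ij(k), w[i][k] for w_ik.
q = {
    0: {1: {0: 1.0}, 2: {1: 1.0}},  # state 0: stay (k=1) or move to state 1 (k=2)
    1: {1: {1: 1.0}},               # state 1: single decision, absorbing
}
w = {0: {1: 1.0, 2: 1.5}, 1: {1: 0.0}}

def discounted_cost(q, w, rule, alpha, start, horizon=200):
    """Approximate psi(start, alpha, R) for the stationary rule `rule`
    (a dict state -> decision) by summing the first `horizon` terms.

    With bounded costs the neglected tail is at most
    alpha**horizon * max|w| / (1 - alpha).
    """
    dist = {start: 1.0}  # distribution of Y_t, starting from Y_0 = start
    total, discount = 0.0, 1.0
    for _ in range(horizon):
        # Expected cost at time t: sum_j P(Y_t = j) * w[j][rule[j]].
        total += discount * sum(p * w[j][rule[j]] for j, p in dist.items())
        # Push the state distribution one step forward under the rule.
        nxt = {}
        for j, p in dist.items():
            for j2, pr in q[j][rule[j]].items():
                nxt[j2] = nxt.get(j2, 0.0) + p * pr
        dist = nxt
        discount *= alpha
    return total

always_stay = {0: 1, 1: 1}
move_once = {0: 2, 1: 1}
print(round(discounted_cost(q, w, always_stay, 0.5, start=0), 6))  # 2.0
print(round(discounted_cost(q, w, move_once, 0.5, start=0), 6))    # 1.5
```

In this example the decision with the smaller immediate cost at state 0 leads to the larger discounted cost, which is why the comparison has to be made on $\psi$ rather than on $w_{jk}$ alone.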


2. Existence Theorem


Our result concerning the discounted cost criterion
can be summarized as follows:

Theorem: If $K_i < \infty$ for each $i \in I$ and $\{w_{jk}\}$ is bounded,
then for a given $\alpha$ ($0 < \alpha < 1$) there exists a non-randomized
stationary rule $R_0$ such that

\[
\psi(i, \alpha, R_0) = \min_{R \in \mathcal{R}} \psi(i, \alpha, R), \qquad i \in I.
\]


Proof: The proof will fall into two parts: the first
to show the existence of an optimal rule and the second
to show that it can be taken to be a non-randomized
stationary rule. The former, following the remarks of
Karlin [7], involves showing that $\mathcal{R}$ is a compact space
and $\psi(i, \alpha, R)$ is a continuous function over $\mathcal{R}$. The latter
makes use of a device employed by Blackwell [3] in a
similar proof for the case of a finite number of states.

If for a fixed $n$ the collection of non-negative
functions $\{D_k^{(n)}(\cdot, \cdot)\}$ is the rule $R_n \in \mathcal{R}$, we say that

\[
\lim_{n \to \infty} R_n = R \in \mathcal{R}
\quad \text{if} \quad
\lim_{n \to \infty} D_k^{(n)}(\cdot, \cdot) = D_k(\cdot, \cdot),
\]

where $\{D_k(\cdot, \cdot)\}$
is the collection of non-negative functions constituting
the rule $R$. In the following we arbitrarily set

\[
P(Y_0 = i) = \beta_i, \qquad i \in I,
\]

where $\beta_i > 0$ and $\sum_{i \in I} \beta_i = 1$.

First we have, as pointed out by Karlin [7],

Lemma 1. If $K_i < \infty$ for each $i \in I$, then $\mathcal{R}$ is compact.

Proof: For a fixed $t$, $s_{t-1}$, and $y_t$, the space $D^{(t)}(s_{t-1}, y_t)$ consisting
of the possible points

\[
\bigl(D_1(s_{t-1}, y_t),\ \ldots,\ D_{K_{y_t}}(s_{t-1}, y_t)\bigr)
\]

is compact since $K_{y_t} < \infty$. By Tychonoff's theorem ([8],
p. 260) the product space

\[
D^{(t)} = \prod_{s_{t-1},\, y_t} D^{(t)}(s_{t-1}, y_t)
\]

is compact; and again by the same theorem, the space

\[
D = \prod_{t=0}^{\infty} D^{(t)}
\]

is compact. However, $D$ is the space $\mathcal{R}$ of all rules.
Hence, $\mathcal{R}$ is compact.

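Secondly, we shall prove

Lemma 2. Under the conditions of the theorem, if $\lim_{n \to \infty} R_n = R \in \mathcal{R}$,
then for each $t$, $t = 0, 1, \ldots$,

\[
\lim_{n \to \infty} \sum_{i \in I} \sum_{j,k} \beta_i\, P_t(j, k \mid i, R_n)\, w_{jk}
= \sum_{i \in I} \sum_{j,k} \beta_i\, P_t(j, k \mid i, R)\, w_{jk};
\]

consequently, $\sum_{i \in I} \beta_i\, \psi(i, \alpha, R)$ for fixed $\alpha$ ($0 < \alpha < 1$) is
continuous over $\mathcal{R}$.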


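Proof: We can write for any $R_n$, $i$, $j$, $k$ and $t$

\begin{align*}
P_t(j, k \mid i, R_n)
&= \sum_{s_{t-1}} P(Y_t = j,\ \Delta_t = k,\ s_{t-1} \mid Y_0 = i,\ R_n) \\
&= \sum_{s_{t-1}} P(\Delta_t = k \mid Y_0 = i,\ s_{t-1},\ Y_t = j,\ R_n)\,
   P(Y_t = j,\ s_{t-1} \mid Y_0 = i,\ R_n) \\
&= \sum_{s_{t-1}} D_k^{(n)}(s_{t-1},\ Y_t = j)\,
   P(Y_t = j \mid Y_0 = i,\ s_{t-1},\ R_n)\,
   P(s_{t-1} \mid Y_0 = i,\ R_n) \\
&= \sum_{s_{t-1}} D_k^{(n)}(s_{t-1},\ Y_t = j)\,
   q_{y_{t-1} j}(\delta_{t-1})\,
   P(s_{t-1} \mid Y_0 = i,\ R_n) \\
&= \sum_{s_{t-1}} D_k^{(n)}(s_{t-1},\ Y_t = j)\,
   q_{y_{t-1} j}(\delta_{t-1})\,
   D_{\delta_{t-1}}^{(n)}(s_{t-2},\ y_{t-1})\,
   q_{y_{t-2} y_{t-1}}(\delta_{t-2}) \cdots
   q_{i y_1}(\delta_0)\, D_{\delta_0}^{(n)}(Y_0 = i).
\end{align*}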
However, since for any $R \in \mathcal{R}$

\[
\sum_{j,k} P_t(j, k \mid i, R) = 1,
\]

and the $D_k^{(n)}$'s converge, it follows from a theorem
due to Scheffé [9] that

\[
\lim_{n \to \infty} \sum_{(j,k) \in E} P_t(j, k \mid i, R_n)
= \sum_{(j,k) \in E} P_t(j, k \mid i, R)
\]

for any set $E$ in the space of possible states and decisions.
However, since $\{w_{jk}\}$ is bounded, the lemma follows using
standard arguments.
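For reference, the form of Scheffé's theorem used in this step, specialized to probability mass functions on a countable set, can be stated as follows. The symbols $p_n$, $p$, and $S$ are introduced only for this remark. One way to read the argument is to take $S$ to be the set of histories $(s_{t-1}, j, k)$: each path probability is a finite product of $q$'s and $D^{(n)}$'s and therefore converges as $n \to \infty$, and summing over histories then gives the statement for sets $E$ of pairs $(j, k)$.

```latex
\[
p_n \ge 0,\quad p \ge 0,\quad
\sum_{x \in S} p_n(x) = \sum_{x \in S} p(x) = 1,\quad
p_n(x) \to p(x)\ \text{for every } x \in S
\;\Longrightarrow\;
\sum_{x \in S} \lvert p_n(x) - p(x) \rvert \to 0,
\]
\[
\text{and hence}\quad
\sup_{E \subseteq S}\,
\Bigl\lvert \sum_{x \in E} p_n(x) - \sum_{x \in E} p(x) \Bigr\rvert
\;\le\; \sum_{x \in S} \lvert p_n(x) - p(x) \rvert \;\to\; 0 .
\]
```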

We remark that since

\[
\sum_{i \in I} \sum_{j,k} \beta_i\, P_t(j, k \mid i, R)\, w_{jk}
\]

is bounded as well as continuous, it follows that for
fixed $\alpha$ ($0 < \alpha < 1$)

\[
\sum_{i \in I} \beta_i\, \psi(i, \alpha, R)
= \sum_{t=0}^{\infty} \alpha^t \sum_{i \in I} \sum_{j,k} \beta_i\, P_t(j, k \mid i, R)\, w_{jk}
\]

is also continuous over $\mathcal{R}$.
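The reasoning behind this remark can be made explicit. If $|w_{jk}| \le M$ for all $j, k$, then the $t$-th term of the series is at most $\alpha^t M$ in absolute value for every rule, because the weights $\beta_i P_t(j, k \mid i, R)$ sum to one. The tail after $T$ terms is therefore at most $\alpha^T M / (1 - \alpha)$ uniformly in $R$, so the partial sums (continuous by lemma 2) converge uniformly and the limit is continuous. The sketch below only computes this tail bound; the names `tail_bound` and `truncation_horizon` and the symbol $M$ are assumptions introduced here.

```python
def tail_bound(alpha, cost_bound, horizon):
    """Upper bound on |sum_{t >= horizon} alpha**t * c_t| when |c_t| <= cost_bound."""
    return alpha ** horizon * cost_bound / (1.0 - alpha)

def truncation_horizon(alpha, cost_bound, eps):
    """Smallest T such that tail_bound(alpha, cost_bound, T) <= eps."""
    if not (0.0 < alpha < 1.0) or eps <= 0.0:
        raise ValueError("need 0 < alpha < 1 and eps > 0")
    horizon = 0
    while tail_bound(alpha, cost_bound, horizon) > eps:
        horizon += 1
    return horizon

print(truncation_horizon(0.5, 1.0, 0.01))  # 8
```

The bound does not depend on the rule, which is exactly what makes the convergence uniform over $\mathcal{R}$.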

Combining lemma 1 and the above remark we have

Lemma 3: Under the conditions of the theorem, for a given
$\alpha$ ($0 < \alpha < 1$) there exists a rule $R^* \in \mathcal{R}$ such that

\[
\psi(i, \alpha, R^*) = \min_{R \in \mathcal{R}} \psi(i, \alpha, R), \qquad i \in I.
\]


Proof: From the well-known fact that a continuous function
achieves its minimum over a compact space we have from
lemma 1 and the remark after lemma 2 that there exists
a rule $R^*$ such that

\[
\sum_{i \in I} \beta_i\, \psi(i, \alpha, R^*)
= \min_{R \in \mathcal{R}} \sum_{i \in I} \beta_i\, \psi(i, \alpha, R).
\]

However, suppose that the $\beta_i$'s are chosen so that $\beta_i > 0$,
$i \in I$; then $R^*$ must be as asserted in the lemma. For otherwise
we could construct a different rule which would
provide a smaller value of $\sum_{i \in I} \beta_i\, \psi(i, \alpha, R)$.

We now proceed to the second part of the proof
of the theorem; namely, to show that there exists a
non-randomized stationary rule $R_0$ such that

\[
\psi(i, \alpha, R_0) = \psi(i, \alpha, R^*), \qquad i \in I,
\]

where $R^*$ is as in lemma 3.

Following Blackwell [3], if $D$ denotes the set of
numbers $\{d_{ik}\}$, $d_{ik} \ge 0$, $\sum_k d_{ik} = 1$, $i \in I$, then let $R_1 = (D, R^*)$
denote the rule:

\[
D_k(Y_0 = i) = d_{ik}, \qquad k = 1, \ldots, K_i,\ i \in I,
\]

followed by use of the rule $R^*$ for the process
$\{Y'_{t-1} = Y_t,\ \Delta'_{t-1} = \Delta_t\}$, $t = 1, 2, \ldots$. More generally, let
$R_n = (D, \ldots, D, R^*)$ denote the rule:

\[
D_k(s_{t-1},\ Y_t = i) = d_{ik}, \qquad k = 1, \ldots, K_i,\ i \in I,\ 0 \le t < n,
\]

followed by use of the rule $R^*$ for the process
$\{Y'_{t-n} = Y_t,\ \Delta'_{t-n} = \Delta_t\}$, $t \ge n$.


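Let $\{d_{ik}\}$ be chosen so that, for each $i \in I$,

\[
\sum_{k} d_{ik}\, w_{ik} + \alpha \sum_{j,k} q_{ij}(k)\, d_{ik}\, \psi(j, \alpha, R^*)
\]

is minimized. Clearly, the minimizing values can be
taken to be zero or one, since the expression is linear in
$(d_{i1}, \ldots, d_{iK_i})$. From such a choice of
$D = \{d_{ik}\}$, it is easily seen that, for $n = 1, 2, \ldots$,

\[
\psi(i, \alpha, R_n) = \psi(i, \alpha, R^*), \qquad i \in I.
\]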
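The one-step choice in Blackwell's device can be sketched as follows: for each state, pick the decision that minimizes the immediate cost plus the discounted expected value of continuing with $R^*$. Because that expression is linear in the randomization weights, a mixture can never do better than the best single decision. The example data and the names `one_step_choice` and `mixed_value` are hypothetical, and `psi_star` is assumed to hold the values $\psi(j, \alpha, R^*)$, which the proof obtains from lemma 3 rather than from a computation.

```python
# Hypothetical example: q[i][k][j] stands for q_ij(k), w[i][k] for w_ik.
q = {
    0: {1: {0: 1.0}, 2: {1: 1.0}},
    1: {1: {1: 1.0}},
}
w = {0: {1: 1.0, 2: 1.5}, 1: {1: 0.0}}
alpha = 0.5
psi_star = {0: 1.5, 1: 0.0}  # assumed optimal values psi(j, alpha, R*) for this example

def score(q, w, alpha, psi_star, i, k):
    """w_ik + alpha * sum_j q_ij(k) * psi(j, alpha, R*)."""
    return w[i][k] + alpha * sum(p * psi_star[j] for j, p in q[i][k].items())

def one_step_choice(q, w, alpha, psi_star):
    """For each state i, a decision k minimizing the one-step expression."""
    return {
        i: min(sorted(q[i]), key=lambda k: score(q, w, alpha, psi_star, i, k))
        for i in q
    }

def mixed_value(q, w, alpha, psi_star, i, d_i):
    """The same expression under a randomized choice d_i (decision -> probability)."""
    return sum(d * score(q, w, alpha, psi_star, i, k) for k, d in d_i.items())

print(one_step_choice(q, w, alpha, psi_star))                    # {0: 2, 1: 1}
print(mixed_value(q, w, alpha, psi_star, 0, {1: 0.5, 2: 0.5}))   # 1.625
```

Here the even mixture at state 0 gives 1.625, which lies between the two pure values 1.75 and 1.5, in line with the remark that the minimizing $d_{ik}$ can be taken to be zero or one.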
Moreover, $\lim_{n \to \infty} R_n = R_0$, the non-randomized stationary rule
with $\{D_{ik}\} = D$. However, by lemma 2, $\psi(i, \alpha, R)$ is
continuous over $\mathcal{R}$; hence

\[
\psi(i, \alpha, R_0) = \lim_{n \to \infty} \psi(i, \alpha, R_n)
= \psi(i, \alpha, R^*), \qquad i \in I.
\]

This last equation establishes the theorem.
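The conclusion of the theorem can be illustrated numerically on a finite example. The sketch below enumerates all non-randomized stationary rules of a hypothetical three-state system, evaluates each one, and looks for a rule that attains the smallest discounted cost from every initial state simultaneously. It searches only over non-randomized stationary rules, so it illustrates the statement rather than verifying it over all of $\mathcal{R}$. The data `q`, `w`, `ALPHA` and the function `evaluate` are assumptions made for this example.

```python
from itertools import product

# Hypothetical example: q[i][k][j] stands for q_ij(k), w[i][k] for w_ik.
q = {
    0: {1: {0: 0.5, 1: 0.5}, 2: {2: 1.0}},
    1: {1: {0: 1.0}, 2: {1: 0.5, 2: 0.5}},
    2: {1: {2: 1.0}},
}
w = {0: {1: 1.0, 2: 4.0}, 1: {1: 2.0, 2: 0.5}, 2: {1: 3.0}}
ALPHA = 0.9

def evaluate(rule, sweeps=2000):
    """Approximate psi(., ALPHA, rule) for a stationary rule by iterating
    psi <- w + ALPHA * Q psi, which converges because ALPHA < 1."""
    psi = {i: 0.0 for i in q}
    for _ in range(sweeps):
        psi = {
            i: w[i][rule[i]] + ALPHA * sum(p * psi[j] for j, p in q[i][rule[i]].items())
            for i in q
        }
    return psi

states = sorted(q)
# Every assignment of one decision per state is a non-randomized stationary rule.
rules = [dict(zip(states, ks)) for ks in product(*(sorted(q[i]) for i in states))]
values = {tuple(r[i] for i in states): evaluate(r) for r in rules}

# Componentwise minimum over the enumerated rules.
best = {i: min(v[i] for v in values.values()) for i in states}
# Rules attaining that minimum for every initial state at once.
optimal = [
    ks for ks, v in values.items()
    if all(abs(v[i] - best[i]) < 1e-9 for i in states)
]
print(optimal)  # [(1, 1, 1)]
```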


3. Counter-Example

There is no difficulty in providing an example
in which the condition of finiteness of the $K_i$'s is
violated and the conclusion of the theorem does not
hold.

The following example shows that the theorem
may not hold if the boundedness condition on $\{w_{jk}\}$ is
weakened. Let $I$ consist of the states $0, 1_a, 1_b, 2_a,
2_b, \ldots$ and suppose there are two possible decisions
at states $1_a, 2_a, \ldots$ and only one possible decision at
states $0, 1_b, 2_b, \ldots$. Assume that


90(1) 


i=l,2, ... 


$q_{i_a (i+1)_a}(1) = p$,  $q_{i_a 0}(1) = 1 - p$,  $i = 1, 2, \ldots$,

$0 < p < 1$,


$q_{i_a i_b}(2) = 1$,  $i = 1, 2, \ldots$


(op)1'1 


i=  1 1 2 » •  •  • 


1 

(2ap)i-1 


i=l,2, . . . 


Let $P(Y_0 = 1_a) = 1$. Let $R_n$ denote the rule: make decision
1 for all $t < n$; if $Y_n = (n+1)_a$, make decision 2 at
$t = n$. Then, on computing $\psi$, we get
$\psi(1_a, \alpha, R_n) = w_{1_a,1} + \alpha p\, w_{2_a,1} + \cdots + (\alpha p)^n\, w_{(n+1)_a,2}$ +


<“P)n  ~T^r  w(n+l),  1 


=  -  [  (i  +  -|-+  ...  +  -i-)  + 


Thus $\lim_{n \to \infty} \psi(1_a, \alpha, R_n) = -\infty$. However, every $R \in \mathcal{R}$ will clearly
yield a finite value for $\psi(1_a, \alpha, R)$. Thus no optimal rule
exists.


4.  Remarks 


Of interest are conditions under which the assertion
of the theorem holds when $\psi$ is replaced by

\[
\lim_{T \to \infty} \frac{1}{T} \sum_{t=0}^{T-1} \sum_{j,k} P_t(j, k \mid i, R)\, w_{jk}.
\]
When the limit exists this is usually referred to as the
average cost per unit time. It was shown in [4] that
the theorem holds when $I$ is finite. However, a proof in
the denumerable case has not been given and it is not
entirely clear that it is true, notwithstanding the
usual intuitive arguments.
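For a finite example, the average cost per unit time of a stationary rule can be approximated by averaging the expected one-period costs over a long horizon. The sketch below assumes the usual reading of the criterion as a time average of expected costs; the deterministic two-state cycle and the name `average_cost` are hypothetical.

```python
def average_cost(q, w, rule, start, horizon):
    """(1/T) * sum_{t < T} E[w(Y_t, Delta_t)] for a stationary rule, with T = horizon."""
    dist = {start: 1.0}  # distribution of Y_t, starting from Y_0 = start
    total = 0.0
    for _ in range(horizon):
        total += sum(p * w[j][rule[j]] for j, p in dist.items())
        nxt = {}
        for j, p in dist.items():
            for j2, pr in q[j][rule[j]].items():
                nxt[j2] = nxt.get(j2, 0.0) + p * pr
        dist = nxt
    return total / horizon

# Hypothetical deterministic cycle 0 -> 1 -> 0 -> ..., with costs 1 and 3.
q = {0: {1: {1: 1.0}}, 1: {1: {0: 1.0}}}
w = {0: {1: 1.0}, 1: {1: 3.0}}
rule = {0: 1, 1: 1}
print(average_cost(q, w, rule, start=0, horizon=1000))  # 2.0
```

In this example the expected cost at time $t$ oscillates and has no limit, yet the time average converges, which is why the criterion is stated in terms of the averaged sum.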

For $I$ finite, Blackwell [3] obtained a stronger
result. He showed there exists a non-randomized stationary
rule $R_0$ such that

\[
\psi(i, \alpha, R_0) = \min_{R \in \mathcal{R}^*} \psi(i, \alpha, R), \qquad i \in I,
\]

for every $\alpha$ near enough to but less than one. $\mathcal{R}^*$ is the class
of all rules whose decisions at time $t$ depend only on the
state and $t$. However, from the above result it is clear
that $\mathcal{R}^*$ can be replaced by $\mathcal{R}$. A counter-example appears
in the doctoral thesis of Ashok Maitra (Department of
Statistics, University of California, Berkeley) indicating
that the result does not extend to the denumerable case.